Blind Source Separation of Speech and Background Music for Improved Speech Recognition
Abstract
Consider the setup where one wants to apply ASR in the presence of background music, as is often the case with broadcast news or documentary programmes. The speech in such programmes is of good quality and a human has no trouble understanding it, but an ASR system, although very robust against moderate white noise, fails badly when the noise is more structured, because music disturbs the frequency spectrum selectively. Typical audio fragments of this type can be found on, e.g., the news website of the BBC (http://news.bbc.co.uk/cbbcnews/) or in the news flashes on VRT radio (see http://www.radio1.be) or NOS (see http://omroep.nl/nieuws), but almost any documentary programme also adds background music to most, if not all, of its commentary, since this gives the programme a “juicier” flavour. The music used for this purpose is chosen so that it neither disturbs human understanding of the spoken content nor distracts attention. Hence the music often consists of relatively long constant-frequency tones, as can be seen in the spectrogram in Figure 1. In the example of Figure 1, an audio fragment of 31 seconds from a BBC documentary was sampled at 16 kHz and the FFT spectrum was calculated on 512 samples
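The spectrogram parameters above can be made concrete with a small sketch (an illustration, not the authors' code). At 16 kHz, a 512-sample FFT frame lasts 32 ms and its bins are 16000/512 = 31.25 Hz apart, so a long constant-frequency tone appears as a horizontal line in the spectrogram: its energy stays in the same bin frame after frame, whereas white noise spreads its energy over all bins. The Hann window, the hop of 256 samples, the synthetic 500 Hz tone and the helper name `magnitude_spectrogram` are all assumptions; the abstract does not state them.

```python
import numpy as np

FS = 16_000   # sampling rate from the text (16 kHz)
N_FFT = 512   # FFT length from the text (512 samples)
HOP = 256     # assumption: 50% overlap; the hop size is not given in the abstract


def magnitude_spectrogram(x: np.ndarray, n_fft: int = N_FFT, hop: int = HOP) -> np.ndarray:
    """Hypothetical helper: |STFT| of x, shape (n_frames, n_fft // 2 + 1), Hann window assumed."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


# Frequency resolution and frame duration implied by the stated parameters.
bin_spacing_hz = FS / N_FFT        # 31.25 Hz between FFT bins
frame_ms = 1000 * N_FFT / FS       # 32.0 ms per analysis frame

# Synthetic stand-in for a sustained musical note: 1 s of a 500 Hz sine (bin 16).
t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 500.0 * t)
spec = magnitude_spectrogram(tone)
peak_bins = spec.argmax(axis=1)    # dominant bin per frame: constant for a steady tone

# White noise for contrast: the dominant bin wanders from frame to frame.
rng = np.random.default_rng(0)
noise = rng.standard_normal(FS)
noise_peaks = magnitude_spectrogram(noise).argmax(axis=1)

print(bin_spacing_hz, frame_ms)
print("tone peak bins:", set(peak_bins.tolist()))
print("distinct noise peak bins:", len(set(noise_peaks.tolist())))
```

The tone concentrates its energy in one narrow frequency band for its whole duration, which is the "selective" spectral disturbance that the abstract contrasts with broadband white noise.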
Similar resources
A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments
A new speech enhancement scheme is presented integrating spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. Blind source separation algorithms assuming no a priori knowledge about the sources involved are applied in this spatial pro...
Using Pitch, Amplitude Modulation, and Spatial Cues for Separation of Harmonic Instruments from Stereo Music Recordings
Recent work in blind source separation applied to anechoic mixtures of speech allows for improved reconstruction of sources that rarely overlap in a time-frequency representation. While the assumption that speech mixtures do not overlap significantly in time-frequency is reasonable, music mixtures rarely meet this constraint, requiring new approaches. We introduce a method that uses spatial cue...
Blind Source Separation of Real World Signals
We present a method to separate and deconvolve sources which have been recorded in real environments. The use of noncausal FIR filters allows us to deal with nonminimum mixing systems. The learning rules can be derived from different viewpoints such as information maximization, maximum likelihood and negentropy which result in similar rules for the weight update. We transform the learning rule ...
A comparison of simultaneous 3-channel blind source separation to selective separation on channel pairs using 2-channel BSS
A number of real-life speech applications using BSS have been reported for two channel applications but only a few have been reported for multi-channel (more than 2 channels) applications. Moreover these mostly involve simulation studies or real-life separations in controlled settings. In this paper some practical problems of multichannel applications will be analyzed. A methodology is proposed...
Real-Time Prototype for Integration of Blind Source Extraction and Robust Automatic Speech Recognition
This demo presents a real-time prototype for automatic blind source extraction and speech recognition in presence of multiple interfering noise sources. Binaural recorded mixtures are processed by a combined Blind/Semi-Blind Source Separation algorithm in order to obtain an estimation of the target signal. The recovered target signal is segmented and used as input to a real-time automatic speec...
Publication date: 2003